Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes

Authors

  • Hyeong Soo Chang
  • Robert Givan
  • Edwin K. P. Chong
Abstract

We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, parallel rollout targets the class of problems in which multiple heuristic policies are available, each performing near-optimally on a different set of system paths. Parallel rollout automatically combines the given policies to create a new policy that adapts to the different system paths and improves on the performance of each policy in the set. We formally prove this claim for two criteria: total expected reward and infinite-horizon discounted reward. The parallel rollout approach also resolves the key issue of selecting which policy to roll out among multiple heuristic policies whose performance cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the parallel rollout approach: a buffer management problem and a multiclass scheduling problem.
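The idea of rolling out several heuristic policies at once can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: for each candidate action it samples random traces, scores each trace by the best-performing base policy on that trace, and averages the scores. The names `rollout_return`, `parallel_rollout_action`, `toy_step`, `always_0`, and `always_1`, the default parameters, and the toy environment are all hypothetical; in a partially observable problem, `state` would stand for the belief (information) state.

```python
import random
from statistics import mean


def rollout_return(step, policy, state, horizon, gamma, seed):
    """Discounted return of one base policy along the random trace fixed by `seed`."""
    rng = random.Random(seed)
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, policy(state), rng)
        total += discount * reward
        discount *= gamma
    return total


def parallel_rollout_action(step, actions, policies, state,
                            horizon=10, gamma=0.95, n_traces=16, seed=0):
    """Pick the action with the best sampled value, scoring each trace by its best base policy."""
    best_action, best_value = None, float("-inf")
    for action in actions:
        samples = []
        for i in range(n_traces):
            k = seed * n_traces + i
            # Reuse the same seeds for every action and policy (common random numbers).
            next_state, reward = step(state, action, random.Random(2 * k))
            # Roll out every base policy on the same trace and keep the best one.
            tail = max(rollout_return(step, pi, next_state, horizon - 1, gamma, 2 * k + 1)
                       for pi in policies)
            samples.append(reward + gamma * tail)
        value = mean(samples)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Hypothetical toy problem: reward 1 when the action matches the current state,
# after which the state is redrawn uniformly from {0, 1}.
def toy_step(state, action, rng):
    reward = 1.0 if action == state else 0.0
    return rng.randrange(2), reward


def always_0(state):
    return 0


def always_1(state):
    return 1


chosen = parallel_rollout_action(toy_step, [0, 1], [always_0, always_1], state=1)
```

In this toy problem each base policy is good only on traces that visit its preferred state, which mirrors the setting described in the abstract. Taking the maximum inside the average, per trace, is what distinguishes this sketch from rolling out a single policy.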

Related Resources

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Maintenance can either increase or decrease a system's availability, so it is worthwhile to evaluate a maintenance policy from both the cost and the availability points of view simultaneously, according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...

A Parallel Algorithm for POMDP Solution

Most exact algorithms for solving partially observable Markov decision processes (POMDPs) are based on a form of dynamic programming in which a piecewise-linear and convex representation of the value function is updated at every iteration to more accurately approximate the true value function. However, the process is computationally expensive, thus limiting the practical application of POMDPs i...

Solving Partially Observable Markov Decision Processes by Neural Networks

Partially Observable Markov Decision Processes (POMDPs) cope with sequential decision processes where an agent tries to maximize or minimize some reward without complete knowledge of the process. These models are of interest for quality control, machine maintenance, reinforcement learning, etc. More generally, Monahan 99 has shown that many tasks in partially observable environments can be viewed ...

Sensor Scheduling for Target Tracking Using Approximate Dynamic Programming

To trade off tracking accuracy and interception risk in a multi-sensor multi-target tracking context, we study the sensor-scheduling problem where we aim to assign sensors to observe targets over time. Our problem is formulated as a partially observable Markov decision process, and this formulation is applied to develop a non-myopic sensor-scheduling scheme. We resort to extended Kalman filteri...

Online Policy Improvement in Large POMDPs via an Error Minimization Search

Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for planning under uncertainty. However, most real-world systems are modelled by huge POMDPs that cannot be solved due to their high complexity. To mitigate this difficulty, we propose combining existing offline approaches with an online search process, called AEMS, that can locally improve an appro...

Journal:
  • Discrete Event Dynamic Systems

Volume 14

Publication date: 2004